DiscoverAI DailyHyenaDNA | Open-Source StyleDrop | Data Poisoning
HyenaDNA | Open-Source StyleDrop | Data Poisoning

HyenaDNA | Open-Source StyleDrop | Data Poisoning

Update: 2023-07-05
Share

Description

Welcome back to AI Daily! Today we discuss three great stories, starting with HyenaDNA. The application of the hyena model in DNA sequencing - enabling models to handle a million context length and revolutionizing our understanding of genomics. Secondly, we cover the exciting open-source implementation of StyleDrop - a tool that's making waves in the world of image editing and style replacement. Finally, we delve into the topic of data poisoning - how a small amount of injected data can drastically alter the outcome of an instruction tuning and the implications this has for AI security.

Key Points:

1️⃣ HyenaDNA

* HyenaDNA utilizes sub-quadratic scaling for DNA sequences, enabling a million context length, each a unique nucleotide, trained on 3 trillion tokens.

* HyenaDNA, setting a new state-of-the-art in genomics benchmarks, could predict gene expression changes, elucidating protein creation from genetic polymorphisms.

* It's 160 times faster than previous LLMs, fitting on a single CoLab, showcasing the potential to outperform transformers and attention models.

2️⃣ Open-Source StyleDrop

* An open-source version of Style Drop, an image editing and style replacing tool, has been implemented and made available for public use.

* Style Drop outperforms comparable models and offers comprehensive instructions for setup, allowing users to experiment with stylizing lettering and more.

* Following a pattern set by Dream Booth, Style Drop went from being a Google research paper to being implemented as an open-source project on GitHub.

3️⃣ Data Poisoning

* Two papers discuss data poisoning, a technique where information like ads or SEO can be injected into LLMs, impacting their responses and recommendations.

* Even a small number of examples in a dataset can effectively "poison" it, significantly altering the output of a language model during fine tuning.

* This technique is expected to be used with open-source datasets for fine-tuning, similar to how publishers put fake words in dictionaries to trace usage.

🔗 Episode Links

* HyenaDNA

* StyleDop

* Data Poisoning

* OpenAI

Follow us on Twitter:

* AI Daily

* Farb

* Ethan

* Conner

Subscribe to our Substack:

* Subscribe



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.aidailypod.com
Comments 
00:00
00:00
1.0x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

HyenaDNA | Open-Source StyleDrop | Data Poisoning

HyenaDNA | Open-Source StyleDrop | Data Poisoning

AI Daily